Dynamic Facility Location via Exponential Clocks
The \emph{dynamic facility location problem} is a generalization of the
classic facility location problem proposed by Eisenstat, Mathieu, and Schabanel
to model the dynamics of evolving social/infrastructure networks. The
generalization lies in that the distance metric between clients and facilities
changes over time. This leads to a trade-off between optimizing the classic
objective function and the "stability" of the solution: there is a switching
cost charged every time a client changes the facility to which it is connected.
While the standard linear program (LP) relaxation for the classic problem
naturally extends to this problem, traditional LP-rounding techniques do not,
as they are often sensitive to small changes in the metric resulting in
frequent switches.
We present a new LP-rounding algorithm for facility location problems, which
yields the first constant approximation algorithm for the dynamic facility
location problem. Our algorithm installs competing exponential clocks on the
clients and facilities, and connects every client by the path that repeatedly
follows the smallest clock in the neighborhood. The use of exponential clocks
gives rise to several properties that distinguish our approach from previous
LP-roundings for facility location problems. In particular, we use \emph{no
clustering} and we allow clients to connect through paths of \emph{arbitrary
lengths}. In fact, the clustering-free nature of our algorithm is crucial for
applying our LP-rounding approach to the dynamic problem.
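The basic property behind competing exponential clocks can be illustrated with a minimal sketch. It is not the rounding algorithm from the abstract; it only shows that when independent exponential clocks with rates r_1, ..., r_n compete, clock i rings first with probability r_i / (r_1 + ... + r_n). The function names `first_to_ring` and `ring_frequencies`, and the idea of using arbitrary positive rates as stand-ins for LP values, are assumptions made for illustration.

```python
import random


def first_to_ring(rates, rng):
    """Sample one exponential clock per rate and return the index that rings first.

    All rates are assumed to be strictly positive.
    """
    clocks = [rng.expovariate(r) for r in rates]
    return min(range(len(rates)), key=lambda i: clocks[i])


def ring_frequencies(rates, trials, seed=0):
    """Estimate how often each clock rings first over repeated independent trials."""
    rng = random.Random(seed)
    counts = [0] * len(rates)
    for _ in range(trials):
        counts[first_to_ring(rates, rng)] += 1
    return [c / trials for c in counts]


if __name__ == "__main__":
    # Hypothetical rates; with rates 1 and 3 the second clock should
    # ring first in roughly three quarters of the trials.
    print(ring_frequencies([1.0, 3.0], trials=20000))
```

Because the winner is chosen proportionally to the rates, a small change in the rates shifts the winning probabilities only slightly, which is one intuition for why such clocks suit settings where stability matters.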
Algorithms For Clustering Problems: Theoretical Guarantees and Empirical Evaluations
Clustering is a classic topic in combinatorial optimization and plays a central role in many areas, including data science and machine learning. In this thesis, we first focus on the dynamic facility location problem (i.e., the facility location problem in evolving metrics). We present a new LP-rounding algorithm for facility location problems, which yields the first constant factor approximation algorithm for the dynamic facility location problem. Our algorithm installs competing exponential clocks on clients and facilities, and connects every client by the path that repeatedly follows the smallest clock in the neighborhood. The use of exponential clocks gives rise to several properties that distinguish our approach from previous LP-roundings for facility location problems. In particular, we use \emph{no clustering} and we enable clients to connect through paths of \emph{arbitrary lengths}. In fact, the clustering-free nature of our algorithm is crucial for applying our LP-rounding approach to the dynamic problem.
Furthermore, we present both empirical and theoretical aspects of the k-means problem. The best known algorithm for k-means with a provable guarantee is a simple local-search heuristic that yields an approximation guarantee of , a ratio that is known to be tight with respect to such methods. We overcome this barrier by presenting a new primal-dual approach that enables us (1) to exploit the geometric structure of k-means and (2) to satisfy the hard constraint that at most k clusters are selected without deteriorating the approximation guarantee. Our main result is a -approximation algorithm with respect to the standard LP relaxation. Our techniques are quite general and we also show improved guarantees for the general version of k-means where the underlying metric is not required to be Euclidean and for k-median in Euclidean metrics.
We also improve the running time of our algorithm to almost linear while still maintaining a provable guarantee. We compare our algorithm with {\sc K-Means++} (a widely studied algorithm) and show that we obtain better accuracy with comparable or even better running time.
Streaming Robust Submodular Maximization: A Partitioned Thresholding Approach
We study the classical problem of maximizing a monotone submodular function
subject to a cardinality constraint k, with two additional twists: (i) elements
arrive in a streaming fashion, and (ii) m items from the algorithm's memory are
removed after the stream is finished. We develop a robust submodular algorithm
STAR-T. It is based on a novel partitioning structure and an exponentially
decreasing thresholding rule. STAR-T makes one pass over the data and retains a
short but robust summary. We show that after the removal of any m elements from
the obtained summary, a simple greedy algorithm STAR-T-GREEDY that runs on the
remaining elements achieves a constant-factor approximation guarantee. In two
different data summarization tasks, we demonstrate that it matches or
outperforms existing greedy and streaming methods, even if they are allowed the
benefit of knowing the removed subset in advance. Comment: To appear in NIPS 201
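As a rough illustration of thresholding in a single pass, the sketch below applies one fixed threshold to a coverage function under a cardinality constraint k. This is a simpler, well-known thresholding rule and not STAR-T itself: it has no partitioning structure, no decreasing thresholds, and no robustness to removals. The function name `threshold_stream`, the coverage objective, and the sample data are assumptions made for illustration.

```python
def threshold_stream(sets, order, k, tau):
    """One pass over `order`; keep an item if its marginal coverage gain is at least tau.

    `sets` maps an item id to the set of elements it covers. At most k items are kept.
    """
    chosen = []
    covered = set()
    for item in order:
        if len(chosen) >= k:
            break  # cardinality budget exhausted
        gain = len(sets[item] - covered)  # marginal gain of a coverage function
        if gain >= tau:
            chosen.append(item)
            covered |= sets[item]
    return chosen


if __name__ == "__main__":
    # Hypothetical stream of four items.
    sets = {0: {1, 2, 3}, 1: {3}, 2: {4, 5}, 3: {1, 2, 3, 4, 5, 6}}
    print(threshold_stream(sets, order=[0, 1, 2, 3], k=2, tau=2))
```

With this data the rule keeps items 0 and 2: item 1 adds nothing new, and the budget is full before item 3 arrives, which shows how the result of a fixed threshold depends on arrival order.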
Submodular Maximization Subject to Matroid Intersection on the Fly
Despite a surge of interest in submodular maximization in the data stream model, there remain significant gaps in our knowledge about what can be achieved in this setting, especially when dealing with multiple constraints. In this work, we nearly close several basic gaps in submodular maximization subject to k matroid constraints in the data stream model. We present a new hardness result showing that memory superpolynomial in k is needed to obtain an o(k/(log k))-approximation. This implies near optimality of prior algorithms. For the same setting, we show that one can nevertheless obtain a constant-factor approximation by maintaining a set of elements whose size is independent of the stream size. Finally, for bipartite matching constraints, a well-known special case of matroid intersection, we present a new technique to obtain hardness bounds that are significantly stronger than those obtained with prior approaches. Prior results left it open whether a 2-approximation may exist in this setting, and only a complexity-theoretic hardness of 1.91 was known. We prove an unconditional hardness of 2.69.
An Efficient Streaming Algorithm for the Submodular Cover Problem
We initiate the study of the classical Submodular Cover (SC) problem in the
data streaming model which we refer to as the Streaming Submodular Cover (SSC).
We show that any single pass streaming algorithm using sublinear memory in the
size of the stream will fail to provide any non-trivial approximation
guarantees for SSC. Hence, we consider a relaxed version of SSC, where we only
seek to find a partial cover.
We design the first Efficient bicriteria Submodular Cover Streaming
(ESC-Streaming) algorithm for this problem, and provide theoretical guarantees
for its performance supported by numerical evidence. Our algorithm finds
solutions that are competitive with the near-optimal offline greedy algorithm
despite requiring only a single pass over the data stream. In our numerical
experiments, we evaluate the performance of ESC-Streaming on active set
selection and large-scale graph cover problems. Comment: To appear in NIPS'1
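The offline greedy baseline and the notion of a partial cover can be sketched as follows. This is the standard offline greedy rule for set cover, stopped once a chosen fraction of the universe is covered; it is not ESC-Streaming and needs all sets in memory. The function name `greedy_partial_cover`, the `fraction` parameter, and the sample data are assumptions made for illustration.

```python
def greedy_partial_cover(sets, fraction):
    """Repeatedly pick the set with the largest marginal gain until at least
    `fraction` of the universe is covered. Ties go to the smallest key.
    """
    universe = set().union(*sets.values())
    target = fraction * len(universe)
    covered = set()
    chosen = []
    while len(covered) < target:
        # max() returns the first maximal key in sorted order, so ties are deterministic.
        best = max(sorted(sets), key=lambda name: len(sets[name] - covered))
        if not sets[best] - covered:
            break  # nothing left to gain; target unreachable
        chosen.append(best)
        covered |= sets[best]
    return chosen


if __name__ == "__main__":
    # Hypothetical instance with a six-element universe.
    sets = {"a": {1, 2, 3, 4}, "b": {4, 5}, "c": {6}, "d": {1, 2}}
    print(greedy_partial_cover(sets, fraction=0.8))
    print(greedy_partial_cover(sets, fraction=1.0))
```

Relaxing the coverage target from the full universe to a fraction of it lets the greedy rule stop one set earlier on this instance, which is the kind of trade-off a bicriteria guarantee formalizes.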
Fairness in Streaming Submodular Maximization over a Matroid Constraint
Streaming submodular maximization is a natural model for the task of
selecting a representative subset from a large-scale dataset. If datapoints
have sensitive attributes such as gender or race, it becomes important to
enforce fairness to avoid bias and discrimination. This has spurred significant
interest in developing fair machine learning algorithms. Recently, such
algorithms have been developed for monotone submodular maximization under a
cardinality constraint.
In this paper, we study the natural generalization of this problem to a
matroid constraint. We give streaming algorithms as well as impossibility
results that provide trade-offs between efficiency, quality and fairness. We
validate our findings empirically on a range of well-known real-world
applications: exemplar-based clustering, movie recommendation, and maximum
coverage in social networks. Comment: Accepted to ICML 2